OPTIMAL CONTROL OF MARKOV DIFFUSION PROCESSES

Wendell  H.  Fleming 

ACCOMPANYING  STATEMENT 

This is a concise summary of recent results about optimal control for
Markov diffusion processes, and a guide to recent literature. The paper
is to be presented at the Joint Automatic Control Conference in October
1978. The emphasis is on completely observed diffusions, with briefer
discussion of results for the case of partial observations. Various
methods to deduce the basic dynamic programming principle are discussed,
and some methods for approximate solution are indicated.

ABSTRACT

Some results from optimal stochastic control theory are surveyed in this
paper, with particular emphasis on control of diffusion processes. Methods
for obtaining necessary and sufficient conditions for an optimum are
discussed, as well as some techniques for approximate solution. A new
application of stochastic control methods is made to obtain
Ventcel-Freidlin type estimates for the probability that the states of a
diffusion process remain in a given region during a given time period.

INTRODUCTION 

This paper is intended as a concise survey of recent results in the theory
of optimal control for Markov diffusions. We mention results which
establish rigorously conditions for an optimum, in case of complete or
partial observations, as well as some techniques of approximate solution.

THE  MODEL 

Consider a control system with finite-dimensional state space $R^n$,
subject to random disturbances which are modelled as white noise. The state
at time $t$ is denoted by $\xi(t)$ and the control by $u(t)$. The state
process obeys a stochastic differential equation (Ito sense)

$$d\xi = f[\xi(t), u(t)]\,dt + \sigma[\xi(t)]\,dw, \tag{1}$$

with $w(t)$ a Brownian motion process of some dimension $m$, and with
$u(t) \in U$, where $U$ is a given "control space". The controller may have
complete or partial information about past system states. Various kinds of
performance criteria have been considered. For instance, one may consider
(1) on a finite time interval $0 \le t \le T$, and seek a control
minimizing an expectation

$$J = E\left\{ \int_0^T L[\xi(t), u(t)]\,dt + \Phi[\xi(T)] \right\}. \tag{2}$$

See [10, Chap. VI]. Another possible criterion, discussed below, is the
probability of exit from a given region $D$.
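
As an illustration of the model, the following is a minimal sketch that
estimates a criterion of type (2) by Monte Carlo for a scalar version of
(1) under a feedback control. It is not taken from the references. The
Euler-Maruyama time stepping, the function name simulate_cost, and the
example drift, costs and feedback laws are all assumptions made for the
illustration.

```python
import math
import random


def simulate_cost(f, sigma, L, Phi, feedback, x0, T, n_steps, n_paths, seed=0):
    """Monte Carlo estimate of E{ int_0^T L dt + Phi(xi(T)) } for scalar (1).

    All argument names are hypothetical; Phi denotes the terminal cost.
    """
    rng = random.Random(seed)
    dt = T / n_steps
    total = 0.0
    for _ in range(n_paths):
        x, cost = x0, 0.0
        for k in range(n_steps):
            u = feedback(k * dt, x)              # feedback control u(t, xi(t))
            cost += L(x, u) * dt                 # left-endpoint rule for the running cost
            dw = rng.gauss(0.0, math.sqrt(dt))   # Brownian increment over one step
            x += f(x, u) * dt + sigma(x) * dw    # Euler-Maruyama step (Ito sense)
        total += cost + Phi(x)
    return total / n_paths


# Assumed example: d(xi) = u dt + 0.5 dw, running cost x^2 + u^2, no terminal cost.
drift = lambda x, u: u
noise = lambda x: 0.5
running = lambda x, u: x * x + u * u
terminal = lambda x: 0.0

j_feedback = simulate_cost(drift, noise, running, terminal,
                           lambda t, x: -x, 2.0, 1.0, 50, 400)
j_zero = simulate_cost(drift, noise, running, terminal,
                       lambda t, x: 0.0, 2.0, 1.0, 50, 400)
```

Both estimates use the same seed, so the two feedback laws are compared on
the same simulated noise paths. The estimate carries both sampling error
and time-discretization error.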

The white noise idealization in (1) implies that the state process $\xi(t)$
is a Markov diffusion if the control enters in feedback form as
$u(t) = u(t, \xi(t))$. In [4] it is shown that certain stability and other
properties of stochastic control systems continue to hold for wide band
(approximately white) noise. If the noise coefficient $\sigma$ is not
constant, care must be used in passing from wide band to white noise. This
is related to the matter of Ito vs. Stratonovich sense interpretation of
(1) [10, pp. 126-7].

We shall not review here the considerable recent literature on jump Markov
processes. See for instance [5] [21] [23].

COMPLETELY  OBSERVED  SYSTEM  STATES 

Consider the problem of minimizing a criterion $J$ of type (2), in which
the controller can observe the state $\xi(t)$. The theory is in a rather
mature state for this problem. To a considerable extent it is based on
dynamic programming methods. Let $x = \xi(0)$, and let $V = V(x,T)$ denote
the minimum of $J$. Earlier rigorous treatments of the dynamic programming
method relied on the fact that $V$ is a smooth solution of the Bellman
equation, which is a uniformly parabolic second order partial differential
equation if the problem is nondegenerate. By nondegenerate is meant that
the symmetric matrices $\sigma(x)\sigma'(x)$ have eigenvalues bounded below
by some $c > 0$. See [10, Chap. VI], and the more complete development in
the new book [14].
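
For orientation, the Bellman equation referred to above can be written out
formally. The display below is a sketch under the assumptions that the
coefficients do not depend on time, as in (1), that $V(x,T)$ is the minimum
cost over a horizon of length $T$, and that the minimum over $U$ is
attained. The symbol $\Phi$ for the terminal cost and the abbreviation
$a(x)$ are notation chosen here.

```latex
\frac{\partial V}{\partial T}
  = \min_{u \in U} \Big\{ \tfrac{1}{2} \sum_{i,j=1}^{n} a_{ij}(x)\,
      \frac{\partial^2 V}{\partial x_i\, \partial x_j}
    + f(x,u) \cdot \nabla_x V + L(x,u) \Big\},
\qquad a(x) = \sigma(x)\sigma'(x),
\qquad V(x,0) = \Phi(x).
```

Nondegeneracy of $a(x)$ is what makes this equation uniformly parabolic.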

More recently, other techniques have been developed to justify the dynamic
programming principle without appealing to the theory of parabolic partial
differential equations. A purely probabilistic method, relying heavily on
the Girsanov transformation for measures and martingale representation
theorems, was used in [6]. Related ideas were developed further in [2] [7],
and for the average cost per unit time problem in [16]. An elegant
semigroup approach was used in [19]. For simplicity let $L = 0$ in (2), and
write $V(x,T) = S_T \Phi(x)$, with $\Phi$ the terminal cost function in
(2). Then $\{S_T\}$ is a nonlinear semigroup, acting on functions $\Phi$.
The method in [19] is to construct this semigroup directly by a suitable
monotone sequence of approximations which are piecewise constant in time.

PARTIALLY  OBSERVED  SYSTEM  STATES 

Suppose that the controller can observe $\eta(t)$, which satisfies

$$d\eta = g[\xi(t)]\,dt + \sigma_1\,dw_1, \tag{3}$$

with $w_1$ a Brownian motion independent of $w$ and $\eta(0) = 0$. From a
practical viewpoint, the most important result is the classical separation
principle in case of linear state and observation equations (1), (3). See
[10] [15]. There remains a technical issue in connection with the
separation principle, concerning the class of controls admitted [15] [22].
For nonlinear systems, general necessary and sufficient conditions for
optimality have been given [6] [7] [13] [20]. However, it seems difficult
to get practically useful information about the solution from those
conditions.

Another point of view is the following. Let $\pi_t$ denote the conditional
distribution of $\xi(t)$ given $\eta(s)$ for $0 \le s \le t$. Even in the
nonlinear case a kind of "separated" control problem can be introduced, in
which the state is $\pi_t$ (regarded as completely observed). If $\xi(t)$
is a finite-state controlled Markov chain, rather than a solution to (1),
and the observation process obeys (3), then the separated problem is itself
a finite dimensional diffusion [3] [23]. When $\xi(t)$ obeys (1) and
$\eta(t)$ obeys (3), the conditional distribution $\pi_t$ is a
measure-valued process obeying the nonlinear filter equation
[17, Chap. 3]. The relation between the separated and original problem with
partial observations in this case is a topic of current research.

APPROXIMATE SOLUTIONS

We return to controlled diffusions with complete observations. Explicit
solutions to the problem of minimizing $J$ are available in few instances.
The best known example is the linear regulator; another is the portfolio
selection problem [10, pp. 160, 166]. One method of approximate solution is
by discretization. This replaces the Bellman equation by difference
equations, which are the dynamic programming equations for a corresponding
controlled Markov chain [15, Chap. 9]. A quite different kind of
approximation method, in case the noise coefficient $\sigma$ is small, was
described in [8]. A related perturbation technique was applied to a
resource management problem in [13]. A method for approximate solution to
nonlinear perturbations of the stochastic linear regulator was given in
[24]. If the perturbation of the state dynamics is polynomial in the state,
then the approximation can be implemented knowing only higher order moments
of the (Gaussian) solution to the linear regulator.
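
The discretization idea can be sketched for a scalar problem as follows.
This is an illustrative explicit upwind difference scheme, not the specific
construction of [15]. The function name value_by_markov_chain, the finite
control set, the reflecting treatment at the ends of the grid, and the
example data are assumptions made here.

```python
import math


def value_by_markov_chain(f, sigma, L, Phi, controls, x_min, x_max, n_x, T):
    """Approximate V(x, T) on a grid by dynamic programming for a controlled chain."""
    h = (x_max - x_min) / (n_x - 1)
    xs = [x_min + i * h for i in range(n_x)]
    # Choose the time step so that every transition probability is nonnegative.
    rate = max(sigma(x) ** 2 + h * abs(f(x, u)) for x in xs for u in controls)
    n_t = max(1, math.ceil(T * rate / h ** 2))
    dt = T / n_t
    V = [Phi(x) for x in xs]                      # value with zero time to go
    for _ in range(n_t):
        V_new = []
        for i, x in enumerate(xs):
            up = V[min(i + 1, n_x - 1)]           # neighbor values, clamped at
            down = V[max(i - 1, 0)]               # the ends of the grid
            best = math.inf
            for u in controls:
                b = f(x, u)
                s2 = 0.5 * sigma(x) ** 2
                p_up = dt * (s2 + h * max(b, 0.0)) / h ** 2
                p_down = dt * (s2 + h * max(-b, 0.0)) / h ** 2
                p_stay = 1.0 - p_up - p_down
                q = L(x, u) * dt + p_up * up + p_down * down + p_stay * V[i]
                best = min(best, q)               # dynamic programming minimization
            V_new.append(best)
        V = V_new
    return xs, V


# Assumed example: d(xi) = u dt + 0.5 dw, cost x^2 + u^2, five admissible controls.
grid, values = value_by_markov_chain(
    lambda x, u: u, lambda x: 0.5, lambda x, u: x * x + u * u,
    lambda x: 0.0, [-1.0, -0.5, 0.0, 0.5, 1.0], -2.0, 2.0, 41, 1.0)
```

The numbers p_up, p_down and p_stay are the one-step transition
probabilities of the approximating chain. Their mean and variance match the
drift and diffusion of (1) to first order in the time step.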


MINIMUM  EXIT  PROBABILITY 


Let $D$ be a given region in $R^n$, with initial state $\xi(0)$ in $D$. We
say that exit occurs if $\xi(t)$ reaches the boundary of $D$ during the
time interval $0 \le t \le T$. Instead of (2) we may take the exit
probability as criterion to be minimized. This is reasonable if $D$ is
regarded as a region in which the system operates acceptably.

The following asymptotic estimate, for low noise intensities, was given in
[11]. Let $\sigma = \sqrt{\varepsilon}\, I$, with $I$ the identity matrix,
and let $q^\varepsilon$ denote the minimum exit probability. Then
$-\varepsilon \log q^\varepsilon \to J^0$ as $\varepsilon \to 0$, where
$J^0$ is the lower value of a certain differential game. This is analogous
to an estimate of Ventcel-Freidlin type for uncontrolled diffusions
[12, Chap. 14] [9].
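
To give a feeling for the exit criterion, the following minimal sketch
estimates by simulation the exit probability of a scalar diffusion from an
interval under one fixed feedback control, with noise coefficient
sqrt(eps). It does not compute the minimum exit probability or the limit in
the estimate above. The function name exit_probability, the
Euler-Maruyama stepping, and the example control are assumptions made for
illustration.

```python
import math
import random


def exit_probability(f, feedback, eps, a, b, x0, T, n_steps, n_paths, seed=0):
    """Fraction of simulated paths leaving the interval (a, b) during [0, T]."""
    rng = random.Random(seed)
    dt = T / n_steps
    sd = math.sqrt(eps * dt)                 # std. dev. of sqrt(eps) * dw per step
    exits = 0
    for _ in range(n_paths):
        x = x0
        for k in range(n_steps):
            u = feedback(k * dt, x)          # fixed feedback law, not optimized
            x += f(x, u) * dt + sd * rng.gauss(0.0, 1.0)
            if x <= a or x >= b:             # boundary reached: count one exit
                exits += 1
                break
    return exits / n_paths


# Assumed example: d(xi) = u dt + sqrt(eps) dw on D = (-1, 1) with u = -xi.
q_high_noise = exit_probability(lambda x, u: u, lambda t, x: -x,
                                1.0, -1.0, 1.0, 0.0, 1.0, 100, 1000)
q_low_noise = exit_probability(lambda x, u: u, lambda t, x: -x,
                               0.02, -1.0, 1.0, 0.0, 1.0, 100, 1000)
```

Exit is checked only at the grid times, so the estimate tends to understate
the exit probability of the continuous path. For small eps the event is
rare, and a plain Monte Carlo estimate of this kind needs many paths to
resolve it.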

OTHER  PROBLEMS 

Among optimization problems for diffusions which we have not discussed are
optimal stopping [12] [14] [15], and impulsive control [1]. Finally, we
should mention techniques of variational inequalities and quasivariational
inequalities [1], which provide another framework in which to study a broad
class of optimization problems.